Mining strongly correlated item pairs in large transaction databases

Authors

  • Swarup Roy
  • Dhruba Kumar Bhattacharyya
Abstract

Correlation mining is an approach to drawing statistical relationships between items from transaction data. Most existing techniques use Pearson’s correlation coefficient as the measure of correlation, which may not perform well when the data are noisy and binary. Moreover, they require multiple passes over the database. This paper presents an effective and faster correlation mining technique to extract the most strongly correlated item pairs from large transaction databases. As an alternative to Pearson’s correlation coefficient, it presents a method of computing Spearman’s rank order correlation coefficient from transaction data. The proposed technique was found to perform satisfactorily in terms of execution time on several real and synthetic datasets when compared with other similar techniques. To demonstrate its usefulness, an application of the proposed technique to extracting a yeast genetic network from gene expression data is also reported.
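
The abstract contrasts its approach with Pearson-based, multi-pass techniques. As background on that baseline, here is a minimal sketch, assuming plain in-memory transactions: on 0/1 item data, Pearson's coefficient reduces to the phi coefficient, which needs only item and pair support counts, and those counts can be gathered in a single scan. This is not the authors' Spearman-based method, which the abstract does not describe. The function names `count_supports`, `phi`, and `top_k_pairs` and the toy baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt
import heapq


def count_supports(transactions):
    """Single scan: count how often each item and each item pair occurs."""
    item_counts = Counter()
    pair_counts = Counter()
    for t in transactions:
        items = sorted(set(t))  # drop duplicates; sort so each pair has one key
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    return len(transactions), item_counts, pair_counts


def phi(n, count_a, count_b, count_ab):
    """Pearson's correlation of two 0/1 item columns (the phi coefficient),
    computed from support counts only."""
    pa, pb, pab = count_a / n, count_b / n, count_ab / n
    denom = sqrt(pa * (1 - pa) * pb * (1 - pb))
    # An item present in every transaction (or in none) has zero variance.
    return (pab - pa * pb) / denom if denom > 0 else 0.0


def top_k_pairs(transactions, k):
    """Return the k item pairs with the largest phi, as (score, pair) tuples.

    Only pairs that co-occur at least once are scored: a pair that never
    co-occurs has phi <= 0, so it cannot be strongly positively correlated.
    """
    n, items, pairs = count_supports(transactions)
    scored = ((phi(n, items[a], items[b], c), (a, b)) for (a, b), c in pairs.items())
    return heapq.nlargest(k, scored)


if __name__ == "__main__":
    # Hypothetical toy baskets, for illustration only.
    baskets = [
        {"bread", "milk"},
        {"bread", "milk"},
        {"beer", "chips"},
        {"beer", "chips"},
        {"bread", "chips"},
    ]
    for score, pair in top_k_pairs(baskets, 2):
        print(pair, round(score, 3))
```

Counting every co-occurring pair in one pass trades scans for memory. In the worst case, the pair table grows quadratically with the number of distinct items.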

Similar resources

Extracting Support Based k most Strongly Correlated Item Pairs in Large Transaction Databases

The support-confidence framework is misleading for finding statistically meaningful relationships in market basket data. The alternative is to find strongly correlated item pairs in the basket data. However, the strongly correlated pairs query suffers from the problem of setting a suitable threshold. To overcome this, the top-k pairs finding problem has been introduced. Most of the existing techniques are multi-p...

Mining top-k strongly correlated item pairs without minimum correlation threshold

Given a user-specified minimum correlation threshold and a transaction database, the problem of mining strongly correlated item pairs is to find all item pairs with Pearson's correlation coefficients above the threshold. However, setting such a threshold is by no means an easy task. In this paper, we consider a more practical problem: mining top-k strongly correlated item pairs, where k is the ...

A FP-Tree Based Approach for Mining All Strongly Correlated Pairs without Candidate Generation

Given a user-specified minimum correlation threshold and a transaction database, the problem of mining all strongly correlated pairs is to find all item pairs with Pearson's correlation coefficients above the threshold. Despite the use of upper bound based pruning technique in the Taper algorithm [1], when the number of items and transactions are very large, candidate pair generation and test is...

Itemset Mining Based on Cofactor Implication

In this paper, we propose a new method for discovering hidden information from large-scale transaction databases by considering a property of cofactor implication. Cofactor implication is an extension or generalization of symmetric itemsets, which has been presented recently. Here we discuss the meaning of cofactor implication for the data mining applications, and show an efficient a...

Tree Based Space Partition of Trajectory Pattern Mining For Frequent Item Sets

Transaction Data base (TD) is an extension of frequent item set mining in large static of data mining field. The dynamic and continuous evolving nature of data base requires up hMinor algorithm, hCount and lossy coun explosion of patterns. Fixed window length and decay factor are required to implement the explosion model. The scanning and the support evaluation for item set are fast. Hence, the...

Journal title:
  • IJDMMM

Volume 5

Publication date: 2013